Guided Project: Visualizing Pixar's Roller Coaster

Posted on Wed 08 July 2015 in Projects

Introduction to the data

In [36]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

pixar_movies = pd.read_csv("PixarMovies.csv")
# Number of rows
print(pixar_movies.shape[0])
15
In [37]:
# Number of columns
print(pixar_movies.shape[1])
16
In [38]:
pixar_movies.head(15)
Out[38]:
    Year Released  Movie                Length  RT Score  IMDB Score  Metacritic Score  Opening Weekend  Worldwide Gross  Domestic Gross  Adjusted Domestic Gross  International Gross  Domestic %  International %  Production Budget  Oscars Nominated  Oscars Won
 0           1995  Toy Story                81       100         8.3                92            29.14            362.0           191.8                   356.21                170.2      52.98%           47.02%                 30                 3           0
 1           1998  A Bug's Life             96        92         7.2                77            33.26            363.4           162.8                   277.18                200.6      44.80%           55.20%                 45                 1           0
 2           1999  Toy Story 2              92       100         7.9                88            57.39            485.0           245.9                   388.43                239.2      50.70%           49.32%                 90                 1           0
 3           2001  Monsters, Inc.           90        96         8.1                78            62.58            528.8           255.9                   366.12                272.9      48.39%           51.61%                115                 3           1
 4           2003  Finding Nemo            104        99         8.2                90            70.25            895.6           339.7                   457.46                555.9      37.93%           62.07%                 94                 4           1
 5           2004  The Incredibles         115        97         8.0                90            70.47            631.4           261.4                   341.28                370.0      41.40%           58.60%                 92                 4           2
 6           2006  Cars                    116        74         7.2                73            60.12            462.0           244.1                   302.59                217.9      52.84%           47.16%                 70                 2           0
 7           2007  Ratatouille             111        96         8.0                96            47.00            623.7           206.4                   243.65                417.3      33.09%           66.91%                150                 5           1
 8           2008  WALL-E                   97        96         8.4                94            63.10            521.3           223.8                   253.11                297.5      42.93%           57.07%                180                 6           1
 9           2009  Up                       96        98         8.3                88            68.11            731.3           293.0                   318.90                438.3      40.07%           59.93%                175                 5           2
10           2010  Toy Story 3             103        99         8.4                92           110.31           1063.2           415.0                   423.88                648.2      39.03%           60.97%                200                 5           2
11           2011  Cars 2                  113        39         6.3                57           109.00            559.9           191.5                   194.43                368.4      34.20%           65.80%                200                 0           0
12           2012  Brave                   100        78         7.2                69            66.30            539.0           237.3                   243.39                301.7      44.03%           55.97%                185                 1           1
13           2013  Monsters University     107        78         7.4                65            82.43            743.6           268.5                   269.59                475.1      36.11%           63.89%                200                 0           0
14           2015  Inside Out              102        98         8.8                93            90.40            677.1           340.5                   340.50                336.6      50.29%           49.71%                175               NaN         NaN
In [22]:
pixar_movies.dtypes
Out[22]:
Year Released                int64
Movie                       object
Length                       int64
RT Score                     int64
IMDB Score                 float64
Metacritic Score             int64
Opening Weekend            float64
Worldwide Gross            float64
Domestic Gross             float64
Adjusted Domestic Gross    float64
International Gross        float64
Domestic %                  object
International %             object
Production Budget            int64
Oscars Nominated           float64
Oscars Won                 float64
dtype: object
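
Two things stand out in this listing: the percentage columns are `object` because every value is a string ending in `%`, and the Oscar columns are `float64` because the missing values for Inside Out are stored as NaN, which a plain `int64` column cannot hold. A minimal sketch of both effects, using a hypothetical two-row frame named `sample` rather than the real CSV:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row sample mirroring three of the columns above.
sample = pd.DataFrame({
    "Movie": ["Toy Story", "Inside Out"],
    "Domestic %": ["52.98%", "50.29%"],  # strings, so the column is not numeric
    "Oscars Won": [0, np.nan],           # the NaN forces a float column
})

print(sample.dtypes)
```

Depending on the pandas version, the string column may be reported as `object` or as a dedicated string dtype; either way it needs cleaning before it can be plotted as a number.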

Data cleaning

In [39]:
# Use the `.str` accessor to apply `rstrip("%")` to every value in the column,
# then cast the result to float with `astype()`.
pixar_movies["Domestic %"] = pixar_movies["Domestic %"].str.rstrip("%").astype("float")
pixar_movies["International %"] = pixar_movies["International %"].str.rstrip("%").astype("float")
In [40]:
# Multiply the `IMDB Score` column by 10 so it is on the same 0-100 scale
# as `RT Score` and `Metacritic Score`.
pixar_movies["IMDB Score"] = pixar_movies["IMDB Score"]*10
In [41]:
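# Create a new DataFrame containing only the first 14 rows (`.loc` slices include the end label).
# This leaves out Inside Out, whose Oscar columns are NaN.
# `.copy()` makes the subset independent, so the in-place `set_index()` below does not act on a slice of `pixar_movies`.
filtered_pixar = pixar_movies.loc[0:13].copy()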
In [44]:
# Set the `Movie` column as the index for the DataFrame.
pixar_movies.set_index("Movie", inplace=True)
filtered_pixar.set_index("Movie", inplace=True)
In [45]:
pixar_movies
Out[45]:
                     Year Released  Length  RT Score  IMDB Score  Metacritic Score  Opening Weekend  Worldwide Gross  Domestic Gross  Adjusted Domestic Gross  International Gross  Domestic %  International %  Production Budget  Oscars Nominated  Oscars Won
Movie
Toy Story                     1995      81       100          83                92            29.14            362.0           191.8                   356.21                170.2       52.98            47.02                 30                 3           0
A Bug's Life                  1998      96        92          72                77            33.26            363.4           162.8                   277.18                200.6       44.80            55.20                 45                 1           0
Toy Story 2                   1999      92       100          79                88            57.39            485.0           245.9                   388.43                239.2       50.70            49.32                 90                 1           0
Monsters, Inc.                2001      90        96          81                78            62.58            528.8           255.9                   366.12                272.9       48.39            51.61                115                 3           1
Finding Nemo                  2003     104        99          82                90            70.25            895.6           339.7                   457.46                555.9       37.93            62.07                 94                 4           1
The Incredibles               2004     115        97          80                90            70.47            631.4           261.4                   341.28                370.0       41.40            58.60                 92                 4           2
Cars                          2006     116        74          72                73            60.12            462.0           244.1                   302.59                217.9       52.84            47.16                 70                 2           0
Ratatouille                   2007     111        96          80                96            47.00            623.7           206.4                   243.65                417.3       33.09            66.91                150                 5           1
WALL-E                        2008      97        96          84                94            63.10            521.3           223.8                   253.11                297.5       42.93            57.07                180                 6           1
Up                            2009      96        98          83                88            68.11            731.3           293.0                   318.90                438.3       40.07            59.93                175                 5           2
Toy Story 3                   2010     103        99          84                92           110.31           1063.2           415.0                   423.88                648.2       39.03            60.97                200                 5           2
Cars 2                        2011     113        39          63                57           109.00            559.9           191.5                   194.43                368.4       34.20            65.80                200                 0           0
Brave                         2012     100        78          72                69            66.30            539.0           237.3                   243.39                301.7       44.03            55.97                185                 1           1
Monsters University           2013     107        78          74                65            82.43            743.6           268.5                   269.59                475.1       36.11            63.89                200                 0           0
Inside Out                    2015     102        98          88                93            90.40            677.1           340.5                   340.50                336.6       50.29            49.71                175               NaN         NaN
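
The cleaning steps can be tried end to end without the CSV. The sketch below is an illustration only: `sample` and `filtered` are hypothetical names, and the three rows are copied from the table shown earlier. It also shows two details that are easy to miss: `.loc[0:1]` includes the row labelled 1, and an explicit `.copy()` keeps the subset independent of the frame it was sliced from.

```python
import pandas as pd

# Three rows copied from the raw table; `sample` stands in for the full CSV.
sample = pd.DataFrame({
    "Movie": ["Toy Story", "Cars 2", "Inside Out"],
    "IMDB Score": [8.3, 6.3, 8.8],
    "Domestic %": ["52.98%", "34.20%", "50.29%"],
})

# Strip the trailing percent sign, then cast the strings to floats.
sample["Domestic %"] = sample["Domestic %"].str.rstrip("%").astype(float)

# Put IMDB on the same 0-100 scale as the other review scores.
sample["IMDB Score"] = sample["IMDB Score"] * 10

# Label-based slicing is inclusive, so this keeps rows 0 and 1.
# .copy() means the in-place set_index() below leaves `sample` untouched.
filtered = sample.loc[0:1].copy()
filtered.set_index("Movie", inplace=True)

print(filtered)
```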

Data visualization, line plots

In [59]:
critics_reviews = pixar_movies[["RT Score","Metacritic Score","IMDB Score"]]
critics_reviews.plot()
Out[59]:
<matplotlib.axes._subplots.AxesSubplot at 0x10eca6550>
In [60]:
critics_reviews.plot(figsize=(10,6))
Out[60]:
<matplotlib.axes._subplots.AxesSubplot at 0x10eb36630>
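
`DataFrame.plot()` returns the matplotlib Axes it drew on, which is why each cell above echoes an `AxesSubplot` object. Keeping a reference to that object makes it possible to label the chart. A minimal sketch, assuming a hypothetical three-movie frame named `scores` with values copied from the cleaned table; the title and axis label are illustrative choices, not part of the original project:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import pandas as pd

# Three movies copied from the cleaned table (IMDB already on a 0-100 scale).
scores = pd.DataFrame(
    {
        "RT Score": [100, 39, 98],
        "Metacritic Score": [92, 57, 93],
        "IMDB Score": [83, 63, 88],
    },
    index=["Toy Story", "Cars 2", "Inside Out"],
)

# plot() returns the Axes, so labels can be added afterwards.
ax = scores.plot(figsize=(10, 6))
ax.set_xticks(range(len(scores)))
ax.set_xticklabels(scores.index, rotation=45, ha="right")
ax.set_ylabel("Score (0-100)")
ax.set_title("Review scores by movie")
```

In a notebook, ending the cell with a semicolon or assigning the result to a variable, as here, also suppresses the `AxesSubplot` echo.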

Data visualization, box plot

In [66]:
pixar_movies[["RT Score","Metacritic Score","IMDB Score"]].plot(kind="box")
Out[66]:
<matplotlib.axes._subplots.AxesSubplot at 0x10f734828>
In [68]:
pixar_movies[["RT Score","Metacritic Score","IMDB Score"]].plot(kind="box", figsize=(9,5))
Out[68]:
<matplotlib.axes._subplots.AxesSubplot at 0x10f5a7cf8>
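
The numbers behind a box plot can be computed directly. The sketch below copies the three score columns from the cleaned table into a hypothetical frame named `scores`, then computes the medians (the line inside each box) and the lower outlier fence for `RT Score`, assuming matplotlib's default rule of 1.5 times the interquartile range for whiskers:

```python
import pandas as pd

# Scores copied from the cleaned table (IMDB already multiplied by 10).
scores = pd.DataFrame({
    "RT Score": [100, 92, 100, 96, 99, 97, 74, 96, 96, 98, 99, 39, 78, 78, 98],
    "Metacritic Score": [92, 77, 88, 78, 90, 90, 73, 96, 94, 88, 92, 57, 69, 65, 93],
    "IMDB Score": [83, 72, 79, 81, 82, 80, 72, 80, 84, 83, 84, 63, 72, 74, 88],
})

# The line inside each box is the column median.
medians = scores.median()

# Values more than 1.5 * IQR below the first quartile are drawn as individual points.
rt = scores["RT Score"]
q1, q3 = rt.quantile(0.25), rt.quantile(0.75)
low_fence = q1 - 1.5 * (q3 - q1)
rt_low_outliers = rt[rt < low_fence]

print(medians)
print(rt_low_outliers)
```

The single value flagged for `RT Score` is the 39 belonging to Cars 2.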

Data visualization, stacked bar plots

In [96]:
revenue_proportions = filtered_pixar[["Domestic %", "International %"]]
revenue_proportions.plot(kind='bar', stacked=True, figsize=(7,6))
Out[96]:
<matplotlib.axes._subplots.AxesSubplot at 0x10e9cd080>
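
A stacked bar of two percentage columns only reads cleanly if each row sums to 100. In the table above, the Toy Story 2 row adds up to 50.70 + 49.32 = 100.02, so a quick check before plotting can be worthwhile. A minimal sketch on a hypothetical frame named `shares`, with three rows copied from the cleaned table and an arbitrary tolerance of 0.005:

```python
import pandas as pd

# Three rows copied from the cleaned table.
shares = pd.DataFrame(
    {
        "Domestic %": [52.98, 50.70, 34.20],
        "International %": [47.02, 49.32, 65.80],
    },
    index=["Toy Story", "Toy Story 2", "Cars 2"],
)

# Each row should total 100; flag rows that miss by more than the tolerance.
totals = shares.sum(axis=1)
suspect = totals[(totals - 100).abs() > 0.005]

print(suspect)
```

A discrepancy this small is most likely a rounding artifact in the source data and does not visibly change the chart.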

Next steps

In [65]:
# Grouped bar plot of Oscar nominations and Oscar wins per movie
filtered_pixar[["Oscars Nominated","Oscars Won"]].plot(kind='bar', figsize=(10,6))
Out[65]:
<matplotlib.axes._subplots.AxesSubplot at 0x10f26c400>